Identifying and Characterizing Nodes Important to Community Structure Using the
Spectrum of the Graph

Background: Many complex systems can be represented as networks, and how a network breaks
up into subnetworks or communities is of wide interest. However, the development of a method 
to detect nodes important to communities that is both fast and accurate is a very challenging and 
open problem. 

Methodology/Principal Findings: In this manuscript, we introduce a new approach to char- 
acterize the node importance to communities. First, a centrality metric is proposed to measure the 
importance of network nodes to community structure using the spectrum of the adjacency matrix. 
We define the node importance to communities as the relative change in the eigenvalues of the 
network adjacency matrix upon their removal. Second, we also propose an index to distinguish two 
kinds of important nodes in communities, i.e., "community core" and "bridge". 

Conclusions/Significance: Our indices rely only on the spectrum of the graph matrix.
They are applied to many artificial networks as well as many real-world networks. This new
methodology provides a basic approach to this challenging problem and yields realistic results.

PACS numbers: 89.75.Hc, 89.75.-k, 89.75.Fb



I. INTRODUCTION 

Networks, despite their simplicity, represent the inter- 
action structure among components in a wide range of 
real complex systems, from social relationships among 
individuals, to interactions of proteins in biological sys- 
tems, to the interdependence of function calls in large 
software projects. The network concept has been de- 
veloped as an important tool for analyzing the relation- 
ship of structure and function for many complex systems 
in recent decades. Many real-world systems show
the existence of structural modules that play significant 
and defined functional roles, such as friend groups in so- 
cial networks, thematic clusters on the world wide web, 
and functional groups in biochemical or neural networks.
Exploring network communities is important for the reasons
listed below: 1) communities reveal the network
at a coarse level, 2) communities provide a new as- 
pect for understanding dynamic processes occurring in 
the network and 3) communities uncover relationships 
among the nodes that, although they can typically be 
attributed to the function of the system, are not appar- 
ent when inspecting the graph as a whole. As a result, 
it is not surprising that recent years have witnessed an 
explosion of research on community structure in graphs, 
and a huge number of methods or techniques have been 
designed [6, 8-17] (see [9] for a review).

It is believed that community structure is important 
to the function of a system [18-20]. In many situations,
it might be desirable to control the function of modular 
networks by adjusting the structure of communities. For 
example, in biological systems, one might like to identify
the nodes that are key to communities and protect them
or disrupt them, such as in the case of lung cancer [19].
In epidemic spreading, one would like to find the impor- 
tant nodes to understand the dynamic processes, which 
could yield an efficient method to immunize modular 
networks [20]. Such strategies would greatly benefit from
a quantitative characterization of the node importance to 
community structure. Some important work related to 
this topic has been proposed. In 2006, Newman proposed 
a community-based metric called "Community Central- 
ity" to measure node importance to communities [8]. His 
basic idea relies on the modularity function Q. Those 
vertices that contribute more to Q are more important 
for the communities than those vertices that contribute 
less. Kovacs et al. also proposed an influence function to 
measure the node importance to communities [22].

In fact, the important nodes can have distinct func- 
tions with respect to community structure. Some pre- 
vious studies have also revealed such classifications. 
Guimera et al. have proposed a classification of the 
nodes based on their roles within communities, us- 
ing their within-module degree and their participation 
coefficient [21]. They divided the hubs into three categories:
provincial hubs, connector hubs and kinless hubs.
Other approaches have also been suggested to discuss
the connection between nodes and modularity in biological
networks, by dividing hub nodes into two categories
called "party hubs" and "date hubs" [24]. When removed
from the network, party and date hubs have strikingly
distinct effects on the overall topology of the network.
Recently, Kovacs et al. proposed an interesting
approach. They introduced an integrative method family
to detect the key nodes, overlapping communities and
"date" and "party" hubs [23]. In a very recent work, the
authors mentioned that modular networks naturally allow
the formation of clusters, and hubs connecting the
modules would enhance the integration of the whole network,
such as in the case of neuronal networks [26]. As a
result, it is intuitive that nodes that are important to
communities can be divided into "community cores" and
"bridges". However, there is one problem. Before using
the participation coefficient and the influence function to 
distinguish these two kinds of vertices, the exact commu- 
nities of the network must first be given. In contrast, it 
is interesting to characterize node importance to commu- 
nities before the division of the network. 

It is understood that the adjacency matrix contains 
all the information of the network. Developing methods 
based only on the adjacency matrix of the network to 
detect important nodes to communities and then distin- 
guish them as either "community core" or "bridge" is an 
interesting and important problem in network research. 
In this manuscript, based only on the adjacency matrix 
of the network, we try to address two fundamental questions:
how to evaluate the node importance to communities
and how to distinguish different kinds of important
nodes? It has been suggested that in many cases the spectrum of
the adjacency matrix gives an indication of the community
structure in the network [23]. If the network has c
strong communities, the c largest eigenvalues of the adja- 
cency matrix are significantly larger than the magnitudes 
of all the other eigenvalues. These large eigenvalues are 
key quantities to the community structure. For this rea- 
son, we suggest a basic approach to solve the above open 
problem using the spectrum of the graph. We define 
the importance of nodes to communities as the relative 
change in the c largest eigenvalues of the network ad- 
jacency matrix upon their removal. Furthermore, using 
the eigenvectors of the graph Laplacian, we divide the 
important nodes into community cores and bridges. We 
apply our method to many networks, including artificial 
networks and real-world networks. This new methodol- 
ogy gives us a basic approach to solve this challenging 
problem and provides a realistic result. 

The organization of this paper is as follows. In section 
II, the centrality metric identifying the important nodes 
to communities is proposed using the spectrum of the 
adjacency matrix. An index to distinguish the two kinds 
of important nodes using the corresponding eigenvector 
of the graph Laplacian is introduced in section III. In 
section IV, our method is applied to artificial networks 
and some real-world networks, and we obtain some interesting
results. In section V, we extend our method
to weighted networks. Finally, concluding remarks are
presented in section VI.



II. CENTRALITY METRIC BASED ON THE 
SPECTRUM OF THE ADJACENCY MATRIX 

We consider a binary network $G = (V, E)$ with $N$
nodes. The adjacency matrix $A$ is the matrix with elements
$A_{ij} = 1$ if there is an edge joining vertices $i$
and $j$, and 0 otherwise. We denote each eigenvalue of $A$
by $\lambda$ and the corresponding eigenvector by $v$, such that
$Av = \lambda v$. The eigenvectors are orthogonal and normalized.
The eigenvalues are ordered by decreasing magnitude:
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N$. It is easy to show that $A$ is symmetric
and the eigenvalues of $A$ are real. Consider the
case of networks that have $c$ communities.
When these communities are disconnected, each one
has its own largest eigenvalue. With proper labeling of
the nodes, the matrix $A$ will have a block matrix structure
with $c \times c$ blocks. Blocks on the diagonal correspond
to the adjacency matrices of the individual communities, 
while the off-diagonal blocks correspond to the edges be- 
tween communities; in other words, we can consider them 
as a perturbation. Therefore, A can be written as 

$$A = A_0 + \delta A, \qquad (1)$$

where $A_0$ is a matrix whose diagonal block elements are
the diagonal block elements of $A$ and whose off-diagonal
block elements are zeros, while $\delta A$ is a matrix with zeros
on its diagonal blocks and with the off-diagonal blocks of
$A$ as its off-diagonal block elements. Chauhan et al. [27]
have proved that if the perturbation strength is small, 
the largest eigenvalues of disconnected communities are 
perturbed more weakly than the perturbation applied. 
The spectrum of the adjacency matrix of a network gives 
a clear indication of the number of communities in the 
network. If the network has c strong communities, the c 
largest eigenvalues are well separated from others. These 
eigenvalues are key quantities to the community struc- 
ture. 
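As a hedged illustration of this spectral signature, the following minimal sketch builds a toy graph and inspects its adjacency spectrum. The graph (two 5-node cliques joined by a single edge) and the helper name `two_cliques_with_bridge` are assumptions chosen for illustration and do not come from the original text.

```python
import numpy as np

def two_cliques_with_bridge(size=5):
    """Hypothetical toy graph: two cliques of `size` nodes joined by one edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0          # first diagonal block (community 1)
    A[size:, size:] = 1.0          # second diagonal block (community 2)
    np.fill_diagonal(A, 0.0)       # no self-loops
    A[size - 1, size] = A[size, size - 1] = 1.0  # the only off-diagonal-block edge
    return A

A = two_cliques_with_bridge()
# eigvalsh is appropriate because A is real and symmetric.
eigenvalues = np.sort(np.linalg.eigvalsh(A))[::-1]   # descending order
print(np.round(eigenvalues[:4], 3))
```

With c = 2 planted communities, the two leading eigenvalues stay close to the largest eigenvalue of an isolated clique, while the third one is far smaller, which is the gap the text refers to.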

For this reason, we define the importance of node $k$ to
communities as the relative change in the $c$ largest eigenvalues
of the network adjacency matrix upon its removal:

$$I_k = \sum_{i=1}^{c} \left| \frac{\Delta\lambda_i}{\lambda_i} \right|, \qquad (2)$$

where $c$ is the number of communities. To avoid the
computational cost, we use perturbation theory to provide
approximations of $I_k$ in terms of the corresponding
eigenvector $v$. Let us denote the matrix before the removal
of the node by $A$ and the matrix after the removal
by $A + \Delta A$; the eigenvalue of this matrix is $\lambda + \Delta\lambda$, and
the corresponding eigenvector is $v + \Delta v$. For large matrices,
it is reasonable to assume that the removal of a node
has a small effect on the whole matrix and the spectral
properties of the network, so that $\Delta A$ and $\Delta\lambda$ are small.
We obtain

$$(A + \Delta A)(v + \Delta v) = (\lambda + \Delta\lambda)(v + \Delta v). \qquad (3)$$

The effect on the adjacency matrix $A$ of removing node
$k$ is given by $(\Delta A)_{ij} = -A_{ij}(\delta_{ik} + \delta_{jk})$. We cannot assume
that $\Delta v$ is small because $\Delta v_k = -v_k$, so we set
$\Delta v = \delta v - v_k e_k$, where $\delta v$ is small and $e_k$ is the unit vector
for the $k$th component. Left multiplying (3) by $v^T$ and
neglecting the second-order terms $v^T \Delta A\, \delta v$ and $v^T \Delta\lambda\, \delta v$, we
obtain

$$\Delta\lambda = \frac{v^T \Delta A\, v - v_k\, v^T \Delta A\, e_k}{v^T v - v_k^2}. \qquad (4)$$

For a large network ($N \gg 1$), we know that $v^T v \gg v_k^2$;
therefore, we can write

$$\Delta\lambda \approx \frac{v^T \Delta A\, v - v_k\, v^T \Delta A\, e_k}{v^T v}. \qquad (5)$$

Because $(\Delta A)_{ij} = -A_{ij}(\delta_{ik} + \delta_{jk})$, we obtain

$$v^T \Delta A\, v = -2\lambda v_k^2, \qquad v_k\, v^T \Delta A\, e_k = -\lambda v_k^2. \qquad (6)$$

Finally, the importance of node $k$ to the community
structure is obtained by

$$I_k = \sum_{i=1}^{c} v_{ik}^2, \qquad (7)$$

where $c$ is the number of communities, $v_{ik}$ is the $k$th
element of $v_i$ and $I_k$ lies in the interval [0, 1]. If $I_k$ is
large, node $k$ is important to the community structure;
otherwise, $k$ is on the periphery of the community.
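The approximation above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy graph (two 6-node cliques plus one extra node linked to one member of each), the use of the algebraically largest eigenvalues, and the rank-by-rank pairing of eigenvalues before and after removal are all choices made here for illustration.

```python
import numpy as np

def node_importance(A, c):
    """Approximate importance: sum of squared k-th components of the c leading eigenvectors."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)     # ascending eigenvalues, orthonormal columns
    top = np.argsort(eigenvalues)[::-1][:c]           # indices of the c largest eigenvalues
    return np.sum(eigenvectors[:, top] ** 2, axis=1)

def exact_relative_change(A, c, k):
    """Brute-force comparison: summed |delta lambda / lambda| after deleting node k.

    Eigenvalues before and after removal are paired by rank (an assumption).
    """
    before = np.sort(np.linalg.eigvalsh(A))[::-1][:c]
    keep = np.delete(np.arange(A.shape[0]), k)
    after = np.sort(np.linalg.eigvalsh(A[np.ix_(keep, keep)]))[::-1][:c]
    return float(np.sum(np.abs((after - before) / before)))

def toy_graph(size=6):
    """Hypothetical example: two cliques plus one extra node linked to both."""
    n = 2 * size + 1
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:2 * size, size:2 * size] = 1.0
    np.fill_diagonal(A, 0.0)
    bridge = n - 1
    for anchor in (0, size):                          # one contact node per clique
        A[bridge, anchor] = A[anchor, bridge] = 1.0
    return A

A = toy_graph()
c = 2
I = node_importance(A, c)
for k in (0, 1, A.shape[0] - 1):
    print(k, round(float(I[k]), 4), round(exact_relative_change(A, c, k), 4))
```

Because the eigenvectors are orthonormal, the approximate scores always sum to c over all nodes, which gives a quick sanity check on any implementation.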

Using this metric $I$, we can quantify the node importance
to the community structure. If the node is important
to the community structure, when we remove it
from the network, the relative changes of the c largest
eigenvalues are large; otherwise, the changes are small.
Before applying $I$, the value of c needs to be determined.
The determination of the number of communities is an
important but challenging question in community analysis.
Here we use the method proposed in Ref. [23]. This
method is based on the properties of the spectrum of the
graph and is independent of the partition algorithms, so
our metric is quite convenient to use.



III. DISTINGUISH TWO KINDS OF
IMPORTANT NODES

As mentioned above, there are two kinds of nodes that
are important to communities. One is the "community
core", and the other is the "bridge" between communities.
Each will affect communities deeply upon its removal.
When we remove a "community core", the community
structure in the network will become fuzzy, while
the community structure will become clearer when we
remove a "bridge". See Fig. 1 for an example. Vertices
1 and 8 are the "community cores", and they organize
their respective communities. Meanwhile, node 15 is the
"bridge" between the two communities. The "community
core" is the leader in the community, and it can organize
the function of each community. In contrast, the "bridge"
connects the modules and can enhance the integration of
the whole network. It is believed that a combination of
both segregation and integration, as in neural systems, is
crucial [26]. It is clear that effectively disconnected and
fully non-synchronous regions cannot allow collective or
integrative action of the elements. Similarly, a fully synchronized
regime does not allow separated or segregated
performance of the elements. Therefore, both situations
are biologically unrealistic, as can be seen from the existence
of related conditions, such as epileptic seizures (collective
phenomena) and Parkinson's disease (segregated
phenomena). For this reason, both the "community
core" and the "bridge" are important to communities,
but they play different roles. The metric we proposed in
Section II can determine the nodes that are important to
communities, but a method to distinguish these two
kinds of important nodes is still needed.

In agreement with earlier findings [22]-[25], we assume
that bridge nodes occupy more inter-modular positions
than community cores. The existence of bridge
nodes often leads to some inter-modular edges. Given a
graph, the simplest and most direct way to construct
a partition of the graph is to solve the mincut problem
(minimize the number of edges between communities).
In practice, however, this method often does not
lead to satisfactory partitions. The problem is that, in
many cases, the solution of mincut simply separates one
individual vertex from the rest of the graph. Of course,
this is not what we want to achieve in clustering, as clusters
should be reasonably large groups of points. Due
to this shortcoming in the mincut problem, one common
objective function to encode the desired information is
RatioCut [30]:

$$\mathrm{RatioCut}(C_1, \ldots, C_c) = \sum_{i=1}^{c} \frac{R(C_i, \bar{C}_i)}{|C_i|}, \qquad (8)$$

where $|C_i|$ is the size of community $C_i$ and $R(C_i, \bar{C}_i)$ is the
number of edges between $C_i$ and the rest of the network. If the sizes of the
communities are almost the same, the RatioCut problem
reduces to the mincut problem.


A. The Condition of c = 2 

If the network is divided into only two communities
($c = 2$), $C$ and its complement $\bar{C}$, we define an index vector $s$ with $N$ elements:

$$s_i = \begin{cases} \sqrt{|\bar{C}|/|C|} & \text{if vertex } i \in C, \\ -\sqrt{|C|/|\bar{C}|} & \text{if vertex } i \in \bar{C}. \end{cases} \qquad (9)$$

Then the RatioCut function is obtained as follows:

$$\mathrm{RatioCut}(C, \bar{C}) = \frac{1}{|V|}\, s^T L s, \qquad (10)$$

where $|V|$ is the number of vertices in the network and
$L$ is the graph Laplacian. $L$ is defined as $L_{ij} = -A_{ij}$ for
$i \neq j$ and $L_{ii} = k_i$, where $k_i$ is the degree of node $i$. We
also have two constraints on $s$: $\sum_{i=1}^{n} s_i = 0$ and $\sum_{i=1}^{n} s_i^2 = n$.
Here the partition problem is equal to the problem

$$\min_{s}\; s^T L s \quad \text{subject to} \quad \sum_{i=1}^{n} s_i = 0, \;\; \sum_{i=1}^{n} s_i^2 = n. \qquad (11)$$



If the components of the vector $s$ are allowed to take
arbitrary values, it can be seen immediately that the solution
of this problem is given by the vector $s$ that is the
eigenvector corresponding to the second-smallest eigenvalue
of $L$, denoted by $u_2$. So we can approximate a
minimizer of RatioCut by the second eigenvector of $L$.
Unfortunately, the components of $s$ are only allowed to
take two particular values.

Thus, the simplest solution is achieved by assigning
vertices to one of the groups according to the sign of
the eigenvector $u_2$. In other words, we assign vertices as
follows: if $(u_2)_i > 0$, we assign vertex $i$ to community $C$;
otherwise, we assign it to $\bar{C}$. Assignment priority begins
with the most positive and the most negative elements; the node
with the most positive element is the first to be assigned
to $C$, then the second and so on, while the node with the
most negative element is similarly the first to be assigned
to $\bar{C}$. If a node's corresponding element is close to
zero, it may have nearly equal membership in both communities,
and we can assign it to both communities. In
conclusion, if the network is divided into only two communities,
we can use this method to characterize which
nodes are the "community cores" and which are the "bridges"
between communities. If node $i$ is a "community core",
$|(u_2)_i|$ is relatively large; otherwise, $|(u_2)_i|$ is near zero.
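The two-community procedure can be sketched as follows. This is a minimal illustration under assumptions: the toy graph (a 4-clique and a 6-clique joined by one edge) and all function names are hypothetical, and RatioCut is taken as the number of cut edges divided by each community size, summed over the two communities.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian with L_ii = k_i and L_ij = -A_ij."""
    return np.diag(A.sum(axis=1)) - A

def ratio_cut(A, in_C):
    """R/|C| + R/|C_bar|, where R counts the edges crossing the cut."""
    R = A[np.ix_(in_C, ~in_C)].sum()
    return R / in_C.sum() + R / (~in_C).sum()

def index_vector(in_C):
    """Two-valued index vector s built from a boolean membership mask."""
    n_C, n_bar = in_C.sum(), (~in_C).sum()
    return np.where(in_C, np.sqrt(n_bar / n_C), -np.sqrt(n_C / n_bar))

# Hypothetical toy graph: a 4-clique and a 6-clique joined by one edge.
n = 10
A = np.zeros((n, n))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

L = laplacian(A)
_, U = np.linalg.eigh(L)        # columns sorted by increasing eigenvalue
fiedler = U[:, 1]               # eigenvector of the second-smallest eigenvalue
in_C = fiedler > 0              # sign-based assignment to C or its complement

s = index_vector(in_C)
lhs = ratio_cut(A, in_C)
rhs = s @ L @ s / n             # s^T L s divided by the number of vertices
print(in_C.astype(int), lhs, rhs)
```

The overall sign of an eigenvector is arbitrary, so which clique ends up labelled C may differ between runs or libraries; the partition itself and the value of RatioCut do not depend on it.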



B. The Condition of c> 2 

Consider the division of a network into $c$ nonoverlapping
communities, where $c$ is the number of communities.
We define an $n \times c$ index matrix $S$ with one column for
each community, $S = (s_1 | s_2 | \cdots | s_c)$, by

$$S_{ij} = \begin{cases} 1/\sqrt{|C_j|} & \text{if vertex } i \in C_j, \\ 0 & \text{otherwise.} \end{cases} \qquad (12)$$

Following the previous section, we obtain

$$\mathrm{RatioCut} = \mathrm{Tr}(S^T L S), \qquad (13)$$

where Tr is the trace of a matrix and $S^T$ is the transpose
of $S$. $L$ is a positive semidefinite and symmetric matrix.
We can write $L = U D U^T$, where $U$ is the matrix of eigenvectors of
$L$, $U = (u_1 | u_2 | \cdots | u_n)$, and $D$ is the diagonal matrix of
eigenvalues, $D_{ii} = \beta_i$. We therefore obtain

$$\mathrm{RatioCut} = \mathrm{Tr}(S^T U D U^T S). \qquad (14)$$

It can also be written as

$$\mathrm{RatioCut} = \sum_{k=1}^{c} \sum_{j=1}^{n} \beta_j \left( \sum_{i=1}^{n} U_{ij} S_{ik} \right)^2. \qquad (15)$$

Now we define the vertex vector of $i$ as $r_i$, and let

$$[r_i]_j = U_{ij}. \qquad (16)$$

If the network has almost equal-sized communities, then
equation (15) can be written as

$$\mathrm{RatioCut} = \frac{1}{|C|} \sum_{k=1}^{c} \sum_{j=1}^{n} \beta_j \left[ \sum_{i \in G_k} [r_i]_j \right]^2, \qquad (17)$$

where $G_k$ is the set of vertices belonging to community
$k$ and $|C|$ is the community size.

Minimizing the RatioCut can be equated with the task
of choosing the nonnegative quantities $\left[ \sum_{i \in G_k} [r_i]_j \right]^2$ so as to place as
much of the weight as possible in the terms corresponding
to the low eigenvalues and as little as possible in the terms
corresponding to the high eigenvalues. This equates to
the following maximization problem:

$$\max \sum_{k=1}^{c} \sum_{j=1}^{p} \beta_j \left[ \sum_{i \in G_k} [r_i]_j \right]^2, \qquad (18)$$

where $p$ is a parameter. We could choose $p = c$ if the
community structure was clear. To this end, we propose
an easy way to distinguish two kinds of important nodes
using the theory of the graph Laplacian. If the community
structure is quite clear, we focus on the vertex
vector magnitude $|r_i|$ in the first $p$ terms, denoted by the
w-score:

$$w_i = \sqrt{\sum_{j=1}^{p} [r_i]_j^2}. \qquad (19)$$

If the w-score of a given vertex is close to zero, we believe
that this vertex has nearly equal membership in more
than one community, and it is likely to be the "bridge" of
these communities. This discrimination process equates
to the "fuzzy" division of the network into communities.
In many cases, this type of fuzzy division could result in
a more accurate picture of real-world networks.
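A minimal sketch of the w-score follows. It rests on one explicit assumption that is not stated in the text: the trivial constant eigenvector of the Laplacian (eigenvalue zero) is skipped, so that the score uses eigenvectors 2 through p. This reading is suggested by the w-score of 0.00 reported for the bridge vertex of the 15-node sketch, but it is an interpretation. The function name and the toy graph are likewise hypothetical.

```python
import numpy as np

def w_score(A, p):
    """Magnitude of each vertex vector over Laplacian eigenvectors 2..p.

    Assumption: the constant eigenvector (column 0) is excluded.
    """
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)                 # columns sorted by increasing eigenvalue
    return np.sqrt(np.sum(U[:, 1:p] ** 2, axis=1))

# Hypothetical toy graph: two 5-cliques plus node 10 linked to one node of each.
size = 5
n = 2 * size + 1
A = np.zeros((n, n))
A[:size, :size] = 1.0
A[size:2 * size, size:2 * size] = 1.0
np.fill_diagonal(A, 0.0)
for anchor in (0, size):
    A[n - 1, anchor] = A[anchor, n - 1] = 1.0

w = w_score(A, p=2)                          # p = c = 2 communities
print(np.round(w, 4))
```

In this symmetric example the extra node sits exactly between the two cliques, so its w-score vanishes while every clique member keeps a clearly nonzero score, which is the behaviour the text associates with a "bridge".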



IV. RESULTS 

Now we test the validity of our indices introduced in 
section II and section III in various artificial networks 
and real-world networks. 



A. Artificial Networks 

First, we consider a sketch composed of 15 nodes (see
Fig. 1) forming two communities. It is intuitive that vertices
1, 8 and 15 are important to the community structure
in this sketch. Vertices 1 and 8 are the so-called
"community cores", and they organize the two communities.
Vertex 15 is the "bridge" between communities,
and it connects these two communities.

TABLE I: Centrality metrics of the example sketched in Fig. 1.

Vertex label   I       M        ΔH       w-score
1              0.32    0.758    -0.145   0.2405
8              0.32    0.758    -0.145   0.2405
15             0.173   0.69      0.116   0.00
2,7,9,14       0.09    0.704     0.04    0.198
3,6,10,13      0.1     0.7535   -0.021   0.285
4,5,11,12      0.105   0.7327   -0.054   0.3175

FIG. 1: Sketch of a network composed of 15 nodes. The diameter
of each vertex is proportional to the centrality metric
I. Moreover, the color of each vertex is related to the index
w-score. Red vertices behave like "overlapping" nodes or
"bridges" between communities, and yellow vertices often lie
inside their own communities.

As we discussed
before, removing vertex 1 or 8 will make the community
structure fuzzy, and removing vertex 15 will make it
clearer. Here we use the index H proposed by Hu et al.
to measure the significance of communities:

H=^^ , (20) 

h ^ 1 



where $\beta$ is the eigenvalue of the graph Laplacian, $\bar{\beta}$ is the
average value of $\beta_2$ through $\beta_c$, $k$ is the average degree of
the network and $n$ is the number of vertices in the network.
In networks with strong communities (many links
within communities and very sparse connections between
them), $H$ is always large. Here we focus on the change of
$H$ due to the removal of vertices, denoted by $\Delta H$. We
also use the centrality metric proposed by Newman [8],
which we denote here by $M$. The results are shown in
Table I. The values of $\Delta H$ imply that vertices 1 and 8
are more important than other vertices because the magnitude
of $\Delta H$ is relatively larger than for the others. Moreover,
their removal makes the communities fuzzy, while vertex
15 acts like a "bridge" between the communities, and its
removal makes the communities clearer. We can see that
our centrality metric performs quite well; it can identify
not only the "community cores", but also the "bridge"
between communities. $M$ can also identify the "community
cores", but it has some problems. One issue is that
its values tend to span a rather small dynamic range from
largest to smallest. Moreover, in some cases (such as this
sketch), $M$ cannot recognize important vertices between
communities. In calculating the index $H$, we need to go
through every vertex in the network, incurring significant
computational cost. In contrast, our method provides a
more efficient way, requiring less computational cost, and
yields the correct answer.



Here we use the classical GN benchmark presented
by Girvan and Newman to test the measurements.
Each network has $N = 128$ nodes that are divided into
four communities ($c = 4$) with 32 nodes each. Edges
between two nodes are introduced with different probabilities,
which depend on whether the two nodes belong
to the same community or not. Each node has $\langle k_{in} \rangle$
links on average with its fellows in the same community
and $\langle k_{out} \rangle$ links with the other communities, and we
impose $\langle k_{in} \rangle + \langle k_{out} \rangle = 16$. The communities become
fuzzier and thus more difficult to identify as $\langle k_{out} \rangle$
increases. Because the GN benchmark is a homogeneous
network, there should not be any nodes that are especially
important to the community structure. To check whether this
conjecture is correct, we let $\langle k_{in} \rangle = 12$ so that
the community structure is quite clear and average the
result for the GN benchmark over 100 configurations of
networks. In the results, the importance of about 120 nodes
lies in the interval [0.03, 0.04], while that of the others lies in the
interval [0.02, 0.03]. The mean value of $I$ is 0.0312, and
the standard deviation is 0.0014. It can be concluded
that, in the GN benchmark, no nodes stand out as
important to the community structure.
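The GN experiment can be sketched as below. This is a hedged illustration rather than the authors' setup: the generator `gn_benchmark`, its parameters and the random seed are assumptions, edges are drawn independently with the probabilities that give the stated average degrees, and only a single realization is produced instead of an average over 100. The importance is computed as the sum of squared components of the c leading adjacency eigenvectors.

```python
import numpy as np

def gn_benchmark(k_in=12.0, k_total=16.0, groups=4, group_size=32, seed=0):
    """Hypothetical generator: planted partition with the stated average degrees."""
    rng = np.random.default_rng(seed)
    n = groups * group_size
    labels = np.repeat(np.arange(groups), group_size)
    same = labels[:, None] == labels[None, :]
    p_in = k_in / (group_size - 1)                 # 31 possible intra-community partners
    p_out = (k_total - k_in) / (n - group_size)    # 96 possible inter-community partners
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each pair once, no self-loops
    A = (upper | upper.T).astype(float)
    return A, labels

def node_importance(A, c):
    """Sum of squared components of the c leading adjacency eigenvectors."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    top = np.argsort(eigenvalues)[::-1][:c]
    return np.sum(eigenvectors[:, top] ** 2, axis=1)

A, labels = gn_benchmark()
I = node_importance(A, c=4)
print(I.mean(), I.std(), I.min(), I.max())
```

Since the eigenvectors are orthonormal, the scores sum to c, so the mean is exactly c/N = 4/128 = 0.03125 for any realization, in line with the reported mean of 0.0312. The informative quantity in this experiment is therefore the spread around that mean.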

We may also test the method on the more challenging
LFR benchmark presented by Lancichinetti et al.
In the LFR benchmark, the degree distribution obeys a
power-law distribution $p(k) \propto k^{-\alpha}$, and the sizes of the
communities are also taken from a power-law distribution
with an exponent $\gamma$. Moreover, each node shares
a fraction $1 - \mu$ of its links with other nodes of its own
community and a fraction $\mu$ with others in the rest of the
network. The community structure can be adjusted by
the mixing parameter $\mu$. Without loss of generality, we
let $\alpha = 2.5$, $\gamma = 1.0$, $\mu = 0.25$ and the size of the network
$N = 1000$. Our numerical results in the LFR benchmark
are shown in Fig. 2. In this case, there is no "bridge"
between communities because $\mu = 0.25$. We may also
calculate the w-score, of which the mean value is 0.1736
and the standard deviation is 0.0292. Moreover, the centrality
metric is positively correlated with node degree
($r^2 = 0.7329$), but some vertices have quite high centrality
while having relatively low degree, and thus the
correlation index is not very high.

FIG. 2: (a) The Zipf plot of the nodes' centrality to communities.
(b) The centrality metric we propose is correlated with
node degree. The parameters in the LFR benchmark are as
follows: $\alpha = 2.5$, $\gamma = 1.0$, $\mu = 0.25$ and the size of the network
$N = 1000$.



B. Real-world Networks



FIG. 3: Our method works quite well in
Zachary's karate club network. Nodes 1 and 34 are the instructor
and the administrator, respectively. In Fig. 3(a),
we can see that these two nodes are more important to the
community structure than other nodes. We also compare our
method with Newman's and find that the two methods exhibit
some differences. In Fig. 3(b), we show that nodes 1
and 34 are the so-called "community cores".



We apply our method to some real-world networks,
such as the Zachary club network [33], the word association
network [34], the scientific collaboration network [35],
and the C. elegans neural network [36].

First, we consider a famous example of a social network,
Zachary's karate club network. This network
represents the pattern of friendships among members
of a karate club at a North American university. It
contains 34 vertices, and the links between vertices are
the friendships between people. The nodes labeled as 1
and 34 correspond to the club instructor and the administrator,
respectively. They had a conflict which resulted
in the breakup of the club. Most other nodes have a relationship
with node 1, node 34, or both. In this network,
$c = 2$. The numerical results are shown in Fig. 3 and
Fig. 4. In Fig. 3(a), we can see that nodes 1 and 34
are the most important nodes in the communities. The results
of our method to distinguish important nodes are shown in Fig.
3(b). From the result, we can see that nodes 1 and 34
are the so-called "community cores", and they have many
connections in their own communities. Furthermore, we
compare our method with Newman's. This result is also
shown in Fig. 3(a), and the two metrics are normalized
by

$$\frac{x - \langle x \rangle}{\sigma_x}, \qquad (21)$$

where $\langle x \rangle$ is the average value of each index and $\sigma_x$ is
the standard deviation of each index. These two methods
show some differences. In our method,
nodes 1 and 34 are clearly more important than other
nodes, while in Newman's method, nodes 2 and 33 are
also quite important, even more than node 1. In this network,
the modularity function Q reaches its maximum
value when the network is divided into 4 communities;
this fact may be the cause of the differences between the
results of these two methods. The visualization of the
karate network with our two measurements is sketched
in Fig. 4. The diameter of each vertex is proportional
to the centrality metric $I$. A large diameter indicates an
important vertex. Additionally, the color of each vertex
is related to the index w-score. Red vertices behave like
"overlapping" nodes or "bridges" between communities,
and yellow vertices often lie inside their own communities.

FIG. 4: Zachary's karate club network, which is composed
of 34 vertices. Vertex diameters indicate the community
centrality $I$. The color of each vertex is proportional to
the index w-score.
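The normalization used to put the two centrality metrics on a common scale can be sketched as a standard z-score. The function name `standardize` and the use of the population standard deviation are assumptions; the input values are the I and M entries of Table I for the 15-node sketch, expanded to one value per vertex, and serve only as sample data.

```python
import numpy as np

def standardize(x):
    """Subtract the mean and divide by the standard deviation of the index."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()   # population standard deviation (an assumption)

# I and M from Table I, one entry per vertex of the 15-node sketch.
I = np.array([0.32, 0.32, 0.173] + [0.09] * 4 + [0.1] * 4 + [0.105] * 4)
M = np.array([0.758, 0.758, 0.69] + [0.704] * 4 + [0.7535] * 4 + [0.7327] * 4)

z_I, z_M = standardize(I), standardize(M)
print(np.round(z_I[:3], 2), np.round(z_M[:3], 2))
```

After this rescaling both indices have zero mean and unit standard deviation, so their rankings can be compared on one plot even though their raw ranges differ widely.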

Second, we analyze the word association network
starting from the word "Bright". This network was
built on the University of South Florida Free Association
Norms [34]. An edge between words A and B indicates
that some people associate the word B to the word
A. The graph displays four communities, corresponding
to the categories Intelligence, Astronomy, Light and Colors.
The word Bright is related to all of them by construction.
We applied our method to this network, and the
results are shown in Fig. 5. From the results, we can
observe that our method considers Bright, Sun, Smart and
Moon as important nodes to the community structure.
It may be inferred from the result that Moon and Smart
are the "community cores", while Bright and Sun are
the "bridges" between communities. Indeed, our metric
yields the correct answer. For example, Smart is the core
of the community Intelligence, while Moon is the core of
the community Astronomy. Meanwhile, the w-score of
node Bright is 0.08, which is close to zero. We would
therefore conclude that it is a "bridge" between communities,
and Bright is in fact the "bridge" among these
four communities, as the network was originally derived
from it.

FIG. 5: Index I and w-score for the nodes of the word association
network. The node importance versus vertex rank is
shown in (a). In (b), we distinguish "community cores" and
"bridges" using the index w-score.

Moreover, we may investigate the effect of node removal
on the modularity function Q. "Community cores"
and "bridges" have different effects on community structure.
When a "bridge" is removed, the communities
become clearer, whereas the removal of a "community core"
makes them fuzzier. For example, the removal of the node
"Bright" makes the modularity function Q increase by
0.03, which is the largest increase caused by the removal
of any single node, while the removal of node "Moon"
causes Q to decrease by 0.015. These results are averaged
over 20 trials. We can see from our results that
important nodes (i.e., nodes with large $I$) affect the communities
considerably. For example, the removal of the
node "Smart" decreases Q by 0.0152, while the removal
of the node "Gifted", which seems to be a peripheral
node, decreases Q by only 0.0048.
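The effect of node removal on Q can be illustrated with a small sketch. Several things here are assumptions made for illustration: the toy graph (two 5-cliques plus one extra node linked to one member of each), the function names, and above all the use of a fixed, hand-assigned partition. The results quoted in the text were averaged over 20 trials, presumably with a community detection step, which this sketch does not reproduce.

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a given partition of an undirected graph."""
    two_m = A.sum()                                  # equals 2m for a symmetric A
    k = A.sum(axis=1)                                # node degrees
    same = labels[:, None] == labels[None, :]        # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def remove_node(A, labels, k):
    """Delete node k and return the reduced adjacency matrix and labels."""
    keep = np.delete(np.arange(A.shape[0]), k)
    return A[np.ix_(keep, keep)], labels[keep]

# Hypothetical toy graph: two 5-cliques plus node 10 linked to nodes 0 and 5.
size = 5
n = 2 * size + 1
A = np.zeros((n, n))
A[:size, :size] = 1.0
A[size:2 * size, size:2 * size] = 1.0
np.fill_diagonal(A, 0.0)
for anchor in (0, size):
    A[n - 1, anchor] = A[anchor, n - 1] = 1.0
labels = np.array([0] * size + [1] * size + [0])     # bridge arbitrarily placed in community 0

q0 = modularity(A, labels)
q_bridge = modularity(*remove_node(A, labels, n - 1))   # remove the bridge node
q_member = modularity(*remove_node(A, labels, 1))       # remove an ordinary clique member
print(q0, q_bridge - q0, q_member - q0)
```

In this toy case removing the bridge raises Q, because the two cliques become fully separated, while removing a clique member lowers it. That matches the qualitative direction described in the text for "Bright" versus "Moon" and "Smart".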

We may also apply our method to social networks, such
as the scientist collaboration network [35], and neural
networks, such as the C. elegans neural network [36].
We analyzed the largest connected component of each
network. The scientist collaboration network represents
scientists whose research centers on the properties of networks
of one kind or another. There are 379 vertices, representing
scientists who are divided into 12 communities.
Edges are placed between scientists who have published
at least one paper together. The neural network of C.
elegans contains 302 neurons and 2,359 links. This network
is divided into 3 communities, with each node representing
a neuron and each link representing a synaptic
connection between neurons. Here we consider the C. elegans
neural network to be undirected. The results are
shown in Fig. 6.

In the scientist collaboration network, our centrality
metric $I$ identifies "group leaders", such as M. Newman,
S. Boccaletti, and A. Barabasi. Their w-scores are not
very large because they often collaborate
with scientists outside their own communities. We
can also find so-called "community cores" based on our
method, such as R. Sole, and "bridge" vertices among
some communities, such as B. Kahng. As we know, the C.
elegans neural network is composed of sensory neurons,
interneurons and motor neurons. The neurons with high
centrality metrics often have the most important functions,
and all of them are interneurons, such as AVA,
AVB, AVD, and AVE. These classes, which synapse
onto motor neurons in the ventral cord, are among the
most prominent neurons in the whole nervous system.
They generally have larger-diameter processes than other
neurons and have many synaptic connections [36, 37]. As
a result, they have larger $I$ than other vertices, while the
typical w-score in these classes is quite small (smaller
than 0.05). In the C. elegans neural network, connection
between communities is more necessary and frequent due
to some special functions.



FIG. 6: The centrality metric I and w-score for the scientist
collaboration network (a,b). The centrality metric I and w-score
are also calculated for the C. elegans neural network (c,d).



V. APPLICATIONS IN WEIGHTED NETWORKS

Our method can be generalized to weighted networks
because the adjacency matrix of an undirected weighted
network is real and symmetric. Thus, in weighted networks,
the importance of a node and its role in communities
are also characterized by its I and w-score. Let us
first consider an artificial weighted network. We use
similarity weights in this network: a higher weight
means a closer relationship between vertices. At first, 10
nodes form a complete network and are divided into two
communities with 5 nodes each. We assign vertices 4 and
9 as the cores of their communities; each core has links
of weight 2 to vertices within its community and links of
weight 0.2 to outside vertices. All other intra-community
connections have weight 1, and all other inter-community
connections have weight 0.2. Then we introduce vertex 11 as
the bridge between the two communities. It connects to
all 10 nodes with weight 1. The index I and w-score for
each node are given in Table II. The results indicate that
vertices 4, 9 and 11 are more important than the other
vertices, while vertex 11 is a "bridge" between these two
communities. Our method works quite well in this small
artificial weighted network.



TABLE II: Centrality metrics I and w-score in a complete
weighted network.

Vertex label    I        w-score
4               0.295    0.316
9               0.295    0.316
11              0.16     0.00
others          0.156    0.316
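
The artificial weighted network described above can be built explicitly, as in the following minimal sketch. Several points are assumptions rather than part of our method: the communities are taken to be vertices 1-5 and 6-10, the function names are hypothetical, and the score computed here is only a removal-based proxy (the relative drop in the sum of the c = 2 largest adjacency eigenvalues when a vertex is deleted). It is not the exact definition of I or of the w-score, so its output need not match Table II. The eigenvalues are obtained with a plain cyclic Jacobi iteration to keep the example dependency-free.

```python
import math


def build_weights():
    """Symmetric 11x11 weight matrix of the artificial network (vertex v -> index v - 1)."""
    cores = {4, 9}
    w = [[0.0] * 11 for _ in range(11)]
    for u in range(1, 11):
        for v in range(u + 1, 11):
            same = (u <= 5) == (v <= 5)   # assumed split: vertices 1-5 and 6-10
            if not same:
                weight = 0.2              # every inter-community link
            elif u in cores or v in cores:
                weight = 2.0              # core to a member of its own community
            else:
                weight = 1.0              # other intra-community links
            w[u - 1][v - 1] = w[v - 1][u - 1] = weight
    for v in range(1, 11):                # vertex 11 links to all others with weight 1
        w[10][v - 1] = w[v - 1][10] = 1.0
    return w


def eigenvalues(a, sweeps=100, tol=1e-22):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:      # keep the rotation angle in [-pi/4, pi/4]
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):           # columns p and q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):           # rows p and q of J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def removal_scores(w, c=2):
    """Relative drop of the sum of the c largest eigenvalues when each vertex is removed."""
    n = len(w)
    base = sum(eigenvalues(w)[:c])
    scores = {}
    for k in range(n):
        keep = [i for i in range(n) if i != k]
        reduced = [[w[i][j] for j in keep] for i in keep]
        scores[k + 1] = (base - sum(eigenvalues(reduced)[:c])) / base
    return scores


w = build_weights()
scores = removal_scores(w)
for vertex in (4, 9, 11, 1):
    print(vertex, round(scores[vertex], 3))
```

Because the construction is symmetric under swapping the two communities, vertices 4 and 9 receive identical proxy scores, as do the non-core vertices of either community.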



As an example of a real-world weighted network, we
investigate the collaboration network among scientists
working at the Santa Fe Institute (the SFI network).
Here we consider it as a weighted, undirected network.
Collaborations between scientists can be repeated, and a
higher frequency of collaboration usually indicates a
closer relationship. Furthermore, weights can be assigned
to the scientists' collaborations quite naturally: an
article with n authors corresponds to a collaboration act
of weight $\frac{1}{n-1}$ between every pair of its
authors [38]. The results for the SFI collaboration
network are sketched in Fig. 7. Vertex diameters
indicate the community centrality I. The color of each
vertex is proportional to the index w-score. Red vertices
behave like "overlapping" nodes or "bridges" between
communities, and yellow vertices often lie inside their
own communities. We do not know the specific names;
however, we observe that the positions of the large
vertices are just like those of "group leaders". Vertices
2, 12 and 24 are so-called "community cores" because
their w-scores are quite large. In fact, they are
the group leaders in the fields of Mathematical Ecology,
Statistical Physics and Structure of RNA, respectively.
However, vertices 1, 9 and 11 are the "bridges" between
communities, and they have relatively small w-scores.
Interestingly, the result in the weighted network is
different from the one in the corresponding unweighted
network. It can be concluded that the edge weight may
affect the result. For example, vertex 9 and vertex 11
collaborate quite often; this makes both of them quite
important in the weighted network, while in the
unweighted network, neither of them is very important to
the community structure.

FIG. 7: Sketch of the SFI scientific collaboration network as
a weighted, undirected network. It has 118 scientists. Vertex
diameters indicate the community centrality I. The color of
each vertex is proportional to the index w-score.
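
The collaboration weighting used for the SFI network can be sketched as follows. The sketch assumes that each article with n authors contributes 1/(n - 1) to every pair of its authors and that contributions from repeated collaborations are summed; the function name `collaboration_weights` and the author lists are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_weights(papers):
    """Sum a weight of 1/(n-1) per paper over every pair of its n authors."""
    weights = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue  # single-author papers create no collaboration
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1.0 / (n - 1)
    return dict(weights)


# Hypothetical author lists, for illustration only.
papers = [["A", "B"], ["A", "B", "C"], ["C"]]
weights = collaboration_weights(papers)
print(weights)
```

Here authors A and B share two papers, so their edge accumulates 1 + 1/2, whereas each pair involving C receives 1/2 from the single three-author paper.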



VI. CONCLUSION AND DISCUSSION

In this paper, we characterize the node importance to
community structure using the spectrum of the graph.
The eigenspectrum of the adjacency matrix gives a clear
indication of the number of "dominant" communities in
a network [27]. We give a centrality metric based on the
spectrum of the adjacency matrix of the graph, and it can
identify the nodes important to the community structure
in many cases. In addition, we propose an index to
distinguish the two kinds of important nodes that we term
"community cores" and "bridges" using the spectrum of
the graph Laplacian.

We demonstrate a variety of applications of our
method to both artificial and real-world networks
representing social and neural networks. Our method works
well in many cases without knowing the exact community
structure, although the number of communities should be
known. However, a limitation of this method arises when
one or more of the communities is much smaller than the
largest community, or when a community has very sparse
intra-community connections compared to other
communities. This may happen when
$N_{\mathrm{small}} \ll N_{\mathrm{large}}$.
Even in the absence of perturbation, the maximum
eigenvalue of a smaller community can lie inside the cloud of
non-Perron-Frobenius eigenvalues of the largest
community. But, with the understanding that the intent of our
method is to find the important nodes in the community
structure, the nodes in very small communities may be
ignored. Even so, if the community structure is so fuzzy
that we cannot identify the number of communities, our
method is not accurate.



Our method can also be used in weighted networks. 
From our result in the SFI network, it can be inferred 
that edge weight may affect the result. Furthermore, it 
may generalize to directed networks because the Perron-
Frobenius eigenvalues are often real and positive [11].
We have yet to treat the case of directed networks. The 
identification of such key nodes is important and could 
potentially be used to identify the organizer of the com- 
munity in social networks, to develop an immunization 
strategy in an epidemic process, to identify key nodes in 
biological networks and so on. We hope our results may 
be helpful to future research. 